Patent abstract:
Techniques and systems are provided to process image data using one or more neural networks. For example, a patch of raw image data can be obtained. The patch can include a subset of pixels from a frame of raw image data, and the frame can be captured using one or more image sensors. The raw image data patch includes a single color component for each pixel in the pixel subset. At least one neural network can be applied to the patch of raw image data to determine a plurality of color component values for one or more pixels in the pixel subset. An output image data patch can then be generated based on applying the at least one neural network to the raw image data patch. The output image data patch includes a subset of pixels from an output image data frame, and also includes the plurality of color component values for one or more pixels of the pixel subset of the output image data frame. Applying the at least one neural network causes the output image data patch to include fewer pixels than the raw image data patch. Multiple patches of the frame can be processed by the at least one neural network in order to generate a final output image. In some cases, the patches of the frame can overlap, so that the final output image contains a complete image.
Publication number: BR112020006869A2
Application number: R112020006869-1
Filing date: 2018-10-05
Publication date: 2020-10-06
Inventors: Hau Hwang; Tushar Sinha PANKAJ; Vishal Gupta; Jisoo Lee
Applicant: Qualcomm Incorporated
IPC primary class:
Patent description:

[0001] This application claims the benefit of U.S. Provisional Patent Application No. 62/571,182, filed on October 11, 2017, which is hereby incorporated by reference in its entirety and for all purposes.

FIELD
[0002] The present disclosure relates generally to image processing and, more specifically, to techniques and systems for performing image processing using an image signal processor.

BRIEF SUMMARY
[0003] In some examples, techniques and systems are described for performing image processing. Traditional image signal processors (ISPs) have separate discrete blocks that address the various partitions of the image-based problem space. For example, a typical ISP has discrete function blocks, each of which applies a specific operation to the raw camera sensor data to create a final output image. Such function blocks can include blocks for demosaicing, noise reduction (denoising), color processing, and tone mapping, among many other image processing functions. Each of these function blocks contains many manually adjusted parameters, resulting in an ISP with a large number of manually adjusted parameters (for example, more than 10,000).
[0004] A machine learning ISP that uses machine learning systems and methods to derive the mapping from raw image data captured by one or more image sensors to a final output image is described in this document. In some examples, raw image data may include a single color or grayscale value for each pixel location. For example, a sensor with a Bayer pattern color filter array (or another suitable color filter array) with one of the red, green, or blue filters at each pixel location can be used to capture raw image data with a single color per pixel location. In some cases, a device may include multiple image sensors to capture the raw image data processed by the machine learning ISP. The final output image can contain processed image data derived from the raw image data. The machine learning ISP can use a neural network of convolutional filters (for example, convolutional neural networks (CNNs)) for the ISP task. The machine learning ISP's neural network may include several similar or repetitive blocks of convolutional filters with a high number of channels (for example, an order of magnitude greater than the number of channels in an RGB or YCbCr image). The machine learning ISP functions as a single unit, rather than having the individual function blocks that are present in a traditional ISP.
[0005] The ISP's neural network can include an input layer, multiple hidden layers, and an output layer. The input layer includes the raw image data from one or more image sensors. The hidden layers can include convolutional filters that can be applied to the input data, or to the outputs of previous hidden layers, to generate feature maps. The filters of the hidden layers can include weights used to indicate the importance of the filter nodes. In some cases, the neural network has a series of many hidden layers, with initial layers determining low-level characteristics of the image input data, and later layers building up a hierarchy of more complex and abstract characteristics. The neural network can then generate the final output image (making up the output layer) based on the determined high-level features.
[0006] According to at least one example, a method of processing image data using one or more neural networks is provided. The method includes obtaining a patch of raw image data. The raw image data patch includes a subset of pixels from a frame of raw image data captured using one or more image sensors. The raw image data patch includes a single color component for each pixel in the pixel subset. The method further includes applying at least one neural network to the patch of raw image data to determine a plurality of color component values for one or more pixels of the pixel subset. The method additionally includes generating an output image data patch based on applying the at least one neural network to the raw image data patch. The output image data patch includes a subset of pixels from an output image data frame. The output image data patch also includes the plurality of color component values for one or more pixels in the subset of pixels of the output image data frame. Applying the at least one neural network causes the output image data patch to include fewer pixels than the raw image data patch.
[0007] In another example, an apparatus is provided to process image data using one or more neural networks, which includes a memory configured to store video data and a processor. The processor is configured to and can obtain a patch of raw image data. The raw image data patch includes a subset of pixels from a frame of raw image data captured using one or more image sensors. The raw image data patch includes a single color component for each pixel in the pixel subset. The processor is additionally configured to and can apply at least one neural network to the raw image data patch to determine a plurality of color component values for one or more pixels in the pixel subset. The processor is additionally configured to and can generate an output image data patch based on applying the at least one neural network to the raw image data patch. The output image data patch includes a subset of pixels from an output image data frame. The output image data patch also includes the plurality of color component values for one or more pixels in the subset of pixels of the output image data frame. Applying the at least one neural network causes the output image data patch to include fewer pixels than the raw image data patch.
[0008] In another example, a non-transitory computer-readable medium is provided that has stored instructions that, when executed by one or more processors, cause the one or more processors to: obtain a patch of raw image data, where the raw image data patch includes a subset of pixels from a frame of raw image data captured using one or more image sensors, and where the raw image data patch includes a single color component for each pixel in the pixel subset; apply at least one neural network to the raw image data patch to determine a plurality of color component values for one or more pixels in the pixel subset; and generate an output image data patch based on applying the at least one neural network to the raw image data patch, the output image data patch including a subset of pixels from an output image data frame. The output image data patch also includes the plurality of color component values for one or more pixels in the subset of pixels of the output image data frame. Applying the at least one neural network causes the output image data patch to include fewer pixels than the raw image data patch.
[0009] In another example, an apparatus is provided to process image data using one or more neural networks. The apparatus includes means for obtaining a patch of raw image data. The raw image data patch includes a subset of pixels from a frame of raw image data captured using one or more image sensors. The raw image data patch includes a single color component for each pixel in the pixel subset. The apparatus further includes means for applying at least one neural network to the patch of raw image data to determine a plurality of color component values for one or more pixels of the pixel subset. The apparatus further includes means for generating an output image data patch based on applying the at least one neural network to the raw image data patch. The output image data patch includes a subset of pixels from an output image data frame. The output image data patch also includes the plurality of color component values for one or more pixels in the subset of pixels of the output image data frame. Applying the at least one neural network causes the output image data patch to include fewer pixels than the raw image data patch.
[0010] In some aspects, the frame of raw image data includes image data from one or more image sensors filtered by a color filter array. In some examples, the color filter array includes a Bayer color filter array.
[0011] In some aspects, applying the at least one neural network to the raw image data patch includes applying one or more stride convolutional filters to the raw image data patch to generate reduced-resolution data representative of the raw image data patch. For example, a stride convolutional filter can include a convolutional filter with a stride greater than one. Each stride convolutional filter of the one or more stride convolutional filters includes an array of weights.
[0012] In some aspects, each stride convolutional filter of the one or more stride convolutional filters includes a plurality of channels. Each channel of the plurality of channels includes a different weight matrix.
[0013] In some aspects, the one or more stride convolutional filters include a plurality of stride convolutional filters. In some examples, the plurality of stride convolutional filters includes: a first stride convolutional filter having a first weight matrix, where applying the first stride convolutional filter to the raw image data patch generates a first set of weighted data representative of the raw image data patch, the first set of weighted data having a first resolution; and a second stride convolutional filter having a second weight matrix, where applying the second stride convolutional filter generates a second set of weighted data representative of the raw image data patch, the second set of weighted data having a second resolution that is lower than the first resolution.
[0014] In some aspects, the methods, apparatuses, and computer-readable medium described above additionally comprise: upscaling the second set of weighted data, which has the second resolution, to the first resolution; and generating combined weighted data representative of the raw image data patch by combining the upscaled second set of weighted data with the first set of weighted data, which has the first resolution.
[0015] In some aspects, the methods, apparatuses, and computer-readable medium described above further comprise applying one or more convolutional filters to the combined weighted data to generate feature data representative of the raw image data patch. Each convolutional filter of the one or more convolutional filters includes an array of weights.
[0016] In some aspects, the methods, apparatuses, and computer-readable medium described above additionally comprise: upscaling the feature data to full resolution; and generating combined feature data representative of the raw image data patch by combining the upscaled feature data with full-resolution feature data, the full-resolution feature data being generated by applying a convolutional filter to a full-resolution version of the raw image data patch.
[0017] In some aspects, generating the output image data patch includes applying a final convolutional filter to the feature data or to the combined feature data to generate the output image data.
[0018] In some aspects, the methods,
[0019] In some aspects, the plurality of color components per pixel includes a red color component per pixel, a green color component per pixel, and a blue color component per pixel.

[0020] In some aspects, the plurality of color components per pixel includes a luma color component per pixel, a first chroma color component per pixel, and a second chroma color component per pixel.

[0021] In some aspects, the at least one neural network jointly performs multiple image signal processor (ISP) functions.

[0022] In some aspects, the at least one neural network includes at least one convolutional neural network (CNN).

[0023] In some aspects, the at least one neural network includes a plurality of layers. In some aspects, the plurality of layers is connected with a high-dimensional representation of the raw image data patch.
[0024] This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all of the drawings, and each claim.
[0025] The foregoing, together with other features and embodiments, will become more apparent upon reference to the following specification, claims, and attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS
[0026] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0027] Illustrative embodiments of the present invention are described in detail below with reference to the following drawing figures:
[0028] Figure 1 is a block diagram illustrating an example of an image signal processor, according to some examples;

[0029] Figure 2 is a block diagram illustrating an example of a machine learning image signal processor, according to some examples;

[0030] Figure 3 is a block diagram illustrating an example of a neural network, according to some examples;

[0031] Figure 4 is a diagram illustrating an example of training a neural network system for a machine learning image signal processor, according to some examples;

[0032] Figure 5 is a block diagram illustrating an example of a convolutional neural network, according to some examples;

[0033] Figure 6 is a diagram illustrating an example of a convolutional neural network of the machine learning image signal processor, according to some examples;

[0034] Figure 7 is a diagram illustrating an example of a multidimensional input to the neural network of the machine learning image signal processor, according to some examples;

[0035] Figure 8 is a diagram illustrating an example of multichannel convolutional filters of a neural network, according to some examples;

[0036] Figure 9 is a diagram illustrating an example of a raw image patch, according to some examples;

[0037] Figure 10 is a diagram illustrating an example of a 2x2 filter of a stride convolutional hidden layer in the machine learning image signal processor neural network, according to some examples;

[0038] Figure 11A to Figure 11E are diagrams illustrating an example of applying the 2x2 filter of the convolutional neural network with stride to the image patch, according to some examples;

[0039] Figure 12A is a diagram illustrating an example of a processed image output from the machine learning image signal processor, according to some examples;

[0040] Figure 12B is a diagram illustrating another example of a processed image output from the machine learning image signal processor, according to some examples;

[0041] Figure 12C is a diagram illustrating another example of a processed image output from the machine learning image signal processor, according to some examples; and
[0042] Figure 13 is a flowchart illustrating an example of a process for processing image data using one or more neural networks, according to some embodiments.

DETAILED DESCRIPTION
[0043] Certain aspects and embodiments of this disclosure are provided below. Some of these aspects and embodiments can be applied independently, and some of them can be applied in combination, as will be evident to those skilled in the art. In the following description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. However, it will be evident that various embodiments can be practiced without these specific details. The figures and description are not intended to be restrictive.
[0044] The following description provides only exemplary embodiments, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes can be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the attached claims.
[0045] Specific details are provided in the description below to provide a thorough understanding of the embodiments. However, it will be understood by those of ordinary skill in the art that the embodiments can be practiced without these specific details. For example, circuits, systems, networks, processes, and other components can be shown as components in block diagram form in order not to obscure the embodiments with unnecessary detail. In other cases, well-known circuits, processes, algorithms, structures, and techniques can be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0046] In addition, it is noted that individual embodiments can be described as a process that is shown as a flowchart, a flow diagram, a data flow diagram, a structural diagram, or a block diagram. Although a flowchart can describe operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations can be rearranged. A process is terminated when its operations are completed, but there may be additional steps not included in a figure. A process can correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or to the main function.
[0047] The term “computer-readable medium”
[0048] In addition, the embodiments can be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks (for example, a computer program product) can be stored in a computer-readable or machine-readable medium. A processor (or processors) can perform the necessary tasks.
[0049] Image signal processing is necessary to process raw image data captured by an image sensor to produce an output image that can be used for various purposes, such as rendering and display, video encoding, computer vision, and storage, among other uses. A typical image signal processor (ISP) obtains raw image data, processes the raw image data, and produces a processed output image.
[0050] Figure 1 is a diagram illustrating an example of a standard ISP 108. As shown, an image sensor 102 captures raw image data. The photodiodes of the image sensor 102 capture varying shades of gray (or monochrome). A color filter can be applied to the image sensor to provide color-filtered raw input data 104 (for example, having a Bayer pattern). The ISP 108 has discrete function blocks, each of which applies a specific operation to the raw camera sensor data to create the final output image. For example, the function blocks can include blocks dedicated to demosaicing, noise reduction (denoising), color processing, and tone mapping, among many others. For example, a demosaicing function block of the ISP 108 can assist in generating an output color image 109 from the color-filtered raw input data 104 by interpolating the color and brightness of pixels using adjacent pixels. This demosaicing process can be used by the ISP 108 to evaluate the color and brightness data for a given pixel, and to compare these values with the data for neighboring pixels. The ISP 108 can then use the demosaicing algorithm to produce an appropriate color and brightness value for the pixel. The ISP 108 can perform various other image processing functions before providing the final output color image 109, such as noise reduction, sharpening, tone mapping, conversion between color spaces, autofocus, gamma, exposure, and white balance, among many other possible image processing functions.
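As a concrete illustration of the kind of per-pixel interpolation such a demosaicing block performs, the following minimal sketch fills in the missing color components at each pixel by averaging the nearest pixels where that color was actually sampled. It assumes an RGGB Bayer layout; the function name and the plain bilinear averaging are illustrative, not taken from the patent.

```python
import numpy as np
from scipy.signal import convolve2d

def bilinear_demosaic(raw):
    """Toy demosaic of an RGGB Bayer mosaic: estimate each missing color
    at a pixel as the average of the nearest sampled neighbors."""
    h, w = raw.shape
    r_mask = np.zeros((h, w), dtype=bool); r_mask[0::2, 0::2] = True
    b_mask = np.zeros((h, w), dtype=bool); b_mask[1::2, 1::2] = True
    g_mask = ~(r_mask | b_mask)
    box = np.ones((3, 3))
    rgb = np.zeros((h, w, 3))
    for c, mask in enumerate((r_mask, g_mask, b_mask)):
        plane = np.where(mask, raw, 0.0)
        total = convolve2d(plane, box, mode='same')               # sum of sampled neighbors
        count = convolve2d(mask.astype(float), box, mode='same')  # number of sampled neighbors
        rgb[..., c] = total / np.maximum(count, 1e-8)             # average the available samples
    return rgb

raw = np.random.rand(8, 8)       # stand-in for color-filtered sensor data
image = bilinear_demosaic(raw)   # (8, 8, 3) image with all three colors per pixel
```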
[0051] The function blocks of the ISP 108 require many adjustment parameters 106 that are manually adjusted to meet certain specifications. In some cases, more than 10,000 parameters need to be adjusted and controlled for a given ISP. For example, to optimize the output color image 109 according to certain specifications, the algorithms for each function block need to be optimized by adjusting the adjustment parameters 106 of the algorithms. New function blocks also need to be added continually to handle different cases that arise in the problem space. The large number of manually adjusted parameters leads to very long and costly support requirements for an ISP.
[0052] A machine learning ISP that uses machine learning systems and methods to perform multiple ISP functions jointly is described in this document. Figure 2 is a diagram illustrating an example of a machine learning ISP 200. The machine learning ISP 200 can include an input interface 201 that can receive raw image data from an image sensor 202. In some cases, the image sensor 202 may include an array of photodiodes that can capture a frame 204 of raw image data. Each photodiode can represent a pixel location and can generate a pixel value for that pixel location. Raw image data from the photodiodes can include a single color or grayscale value for each pixel location in the frame 204.
[0053] An illustrative example of a color filter array includes a Bayer pattern color filter array (or Bayer color filter array), which allows the image sensor 202 to capture a frame of pixels having a Bayer pattern, with one of the red, green, or blue filters at each pixel location. For example, the raw image patch 206 from the frame 204 of raw image data has a Bayer pattern based on a Bayer color filter array used with the image sensor 202. The Bayer pattern includes a red filter, a blue filter, and a green filter, as shown in the pattern of the raw image patch 206 in Figure 2. The Bayer color filter operates by filtering the incoming light. For example, the photodiodes at the green parts of the pattern pass green color information (half of the pixels), the photodiodes at the red parts of the pattern pass red color information (a quarter of the pixels), and the photodiodes at the blue parts of the pattern pass blue color information (a quarter of the pixels).
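To make the sampling pattern concrete, the sketch below simulates a Bayer color filter array over a full-RGB image, keeping one color component per pixel so that half of the pixels carry green and a quarter each carry red and blue. The RGGB tiling is an assumption for illustration; the patent's figures fix the actual arrangement.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Simulate a Bayer color filter array: keep a single color component
    per pixel (RGGB tiling), as an image sensor behind a Bayer CFA would."""
    h, w, _ = rgb.shape
    raw = np.empty((h, w))
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R at even rows / even cols
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G at even rows / odd cols
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G at odd rows / even cols
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B at odd rows / odd cols
    return raw

rgb = np.random.rand(4, 4, 3)
raw = bayer_mosaic(rgb)   # half the pixels carry G, a quarter each carry R and B
```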
[0054] In some cases, a device may include multiple image sensors (which may be similar to the image sensor 202), in which case the machine learning ISP operations described in this document can be applied to the raw image data obtained by the multiple image sensors. For example, a device with multiple cameras can capture image data using the multiple cameras, and the machine learning ISP 200 can apply the ISP operations to the raw image data from the multiple cameras. In an illustrative example, a dual-camera mobile phone, tablet computer, or other device can be used to capture larger images at wider angles (for example, with a wider field of view (FOV)), to capture more light (resulting in more sharpness and clarity, among other benefits), to generate 360-degree video (for example, for virtual reality), and/or to perform other functionality enhanced relative to what a single-camera device can achieve.
[0055] The raw image patch 206 is provided to and received by the input interface 201 for processing by the machine learning ISP 200. The machine learning ISP 200 can use a neural network system 203 for the ISP task. For example, the neural network of the neural network system 203 can be trained to directly derive the mapping from raw image training data captured by image sensors to final output images. For example, the neural network can be trained using examples of numerous raw data inputs (for example, with color-filtered patterns) and also using examples of the corresponding output images that are desired. Using the training data, the neural network system 203 can learn a mapping from the raw input that is needed to obtain the output images, after which the ISP 200 can produce output images similar to those produced by a traditional ISP.
[0056] The neural network of the ISP 200 can include an input layer, multiple hidden layers, and an output layer. The input layer includes the raw image data (for example, the raw image patch 206 or a full frame of raw image data) obtained by the image sensor 202. The hidden layers can include filters that can be applied to the raw image data and/or to the outputs of previous hidden layers. Each of the filters of the hidden layers can include weights used to indicate the importance of the filter nodes. In an illustrative example, a filter can include a 3 x 3 convolutional filter that is convolved around an input matrix, with each entry in the 3 x 3 filter having a unique weight value. At each convolutional iteration (or stride) of the 3 x 3 filter applied to the input matrix, a single weighted output feature value can be produced. The neural network can have a series of many hidden layers, with initial layers determining low-level characteristics of an input, and later layers building up a hierarchy of more complex characteristics. The hidden layers of the neural network of the ISP 200 are connected to a high-dimensional representation of the data. For example, the layers can include several blocks of repetitive convolutions with a high number of channels (dimensions). In some cases, the number of channels may be an order of magnitude greater than the number of channels in an RGB or YCbCr image. Illustrative examples provided below include repetitive convolutions with 64 channels each, providing a non-linear, hierarchical network structure that produces quality image details. For example, as described in more detail in this document, a number n of channels (for example, 64 channels) refers to having an n-dimensional representation (for example, 64-dimensional) of the data at each pixel location. Conceptually, the n channels represent "n features" (for example, 64 features) at the pixel location.
[0057] The neural network system 203 performs the various ISP functions jointly. A particular neural network parameter applied by the neural network system 203 has no explicit analog in a traditional ISP, and conversely, a particular function block in a traditional ISP system has no explicit correspondence in the machine learning ISP. For example, the machine learning ISP performs the signal processing functions as a single unit, rather than having the individual function blocks that a typical ISP would contain to perform the various functions. Additional details of the neural network applied by the neural network system 203 are described below.
[0058] In some examples, the machine learning ISP 200 also includes an optional preprocessing mechanism 207 to augment the input data. Such additional input data (or augmentation data) may include, for example, tone data, radial distance data, automatic white balance (AWB) gain data, a combination thereof, or any other additional data that can augment the pixels of the input data. By supplementing the raw input pixels, the input becomes a set of multidimensional values for each pixel location of the raw image data.
[0059] Based on the determined high-level features, the neural network system 203 can generate an RGB output 208 based on the raw image patch 206. The RGB output 208 includes a red color component, a green color component, and a blue color component per pixel. The RGB color space is used as an example in this application. A person of ordinary skill in the art will appreciate that other color spaces can also be used, such as luma and chroma color components (YCbCr or YUV), or other suitable color components. The RGB output 208 can be output from the output interface 205 of the machine learning ISP 200 and used to generate an image patch in the final output image 209 (which makes up the output layer). In some cases, the pixel array in the RGB output 208 may have a smaller dimension than that of the input raw image patch 206. In an illustrative example, the raw image patch 206 may contain a 128 x 128 array of raw image pixels (for example, in a Bayer pattern), while applying the repetitive convolutional filters of the neural network system 203 causes the RGB output 208 to include an 8 x 8 pixel array. The output size of the RGB output 208 being smaller than the raw image patch 206 is a by-product of applying convolutional filters and of designing the neural network system 203 not to pad the data processed through each of the convolutional filters. Because there are multiple convolutional layers, the output size is progressively reduced. In such cases, the patches of the frame 204 of raw input image data can overlap, so that the final output image 209 contains a complete image. The resulting final output image 209 contains processed image data derived from the raw input data by the neural network system 203. The final output image 209 can be rendered for display, used for compression (or encoding), stored, or used for any other image-based purposes.
[0060] Figure 3 is an illustrative example of a neural network 300 that can be used by the neural network system 203 of the machine learning ISP. An input layer 310 includes input data. The input data of the input layer 310 can include data representing the raw image pixels of a raw image input frame. The neural network 300 includes multiple hidden layers 312a, 312b through 312n. The hidden layers 312a, 312b through 312n include "n" hidden layers, where "n" is an integer greater than or equal to one. The number of hidden layers can be made to include as many layers as are necessary for the given application. The neural network 300 additionally includes an output layer 314 that provides an output resulting from the processing performed by the hidden layers 312a, 312b through 312n. In an illustrative example, the output layer 314 can provide an array of processed output pixels that can be used for an output image (for example, as a patch in the output image or as the complete output image).
[0061] The neural network 300 is a multilayer neural network of interconnected filters. Each filter can be trained to learn a feature representative of the input data. Information associated with the filters is shared among the different layers, and each layer retains information as the information is processed. In some cases, the neural network 300 may include a feed-forward network, in which case there are no feedback connections in which outputs of the network are fed back into itself. In some cases, the network 300 may include a recurrent neural network, which can have loops that allow information to be carried across nodes while reading in input.
[0062] In some cases, information can be exchanged between layers through node-to-node interconnections between the various layers. In some cases, the network may include a convolutional neural network, which may not link every node in one layer to every node in the next layer. In networks where information is exchanged between layers, nodes of the input layer 310 can activate a set of nodes in the first hidden layer 312a. For example, as shown, each of the input nodes of the input layer 310 can be connected to each of the nodes of the first hidden layer 312a. The nodes of the hidden layer 312a can transform the information of each input node by applying activation functions (for example, filters) to that information. The information derived from the transformation can then be passed on and can activate the nodes of the next hidden layer 312b, which can perform their own designated functions. Exemplary functions include convolutional functions, downscaling, upscaling, data transformation, and/or any other suitable functions. The output of the hidden layer 312b can then activate nodes of the next hidden layer, and so on. The output of the last hidden layer 312n can activate one or more nodes of the output layer 314, which provides a processed output image. In some cases, although nodes (for example, node 316) in the neural network 300 are shown as having multiple output lines, a node has a single output, and all lines shown as being output from a node represent the same output value.
[0063] In some cases, each node or interconnection between nodes can have a weight, which is a set of parameters derived from the training of the neural network 300. For example, an interconnection between nodes can represent information learned about the interconnected nodes. The interconnection can have an adjustable numerical weight that can be tuned (for example, based on a training data set), allowing the neural network 300 to be adaptive to inputs and able to learn as more and more data is processed.
[0064] The neural network 300 is pre-trained to process the features in the data of the input layer 310 using the different hidden layers 312a, 312b through 312n, in order to provide the output through the output layer 314. Referring now to Figure 4, a neural network (for example, the neural network 300) implemented by a neural network system 403 of a machine learning ISP can be pre-trained to process raw image data inputs and output processed output images. The training data include raw image data inputs 406 and reference output images 411 that correspond to the raw image data inputs 406. For example, an output image of the reference output images 411 can include a final output image that was previously generated by a standard (non-machine-learning) ISP using a raw image data input. The reference output images 411 may, in some cases, include images processed using the neural network system 403. The raw image data inputs 406 and the reference output images 411 can be input into the neural network system 403, and the neural network (for example, the neural network 300) can determine the mapping from each set of raw image data (for example, each patch of color-filtered raw image data, each frame of color-filtered raw image data, or the like) to each corresponding final output image by adjusting the weights of the convolutional filters of the various hidden layers.
[0065] In some cases, the neural network 300 can adjust the weights of the nodes using a training process called backpropagation. Backpropagation can include a forward pass, a loss function, a backward pass, and a weight update. The forward pass, loss function, backward pass, and parameter update are performed for one training iteration. The process can be repeated for a certain number of iterations for each set of training images until the network 300 is trained well enough that the weights of the layers are accurately tuned.
[0066] The forward pass can include passing through the network 300 a frame or patch of raw image data and a corresponding output image or output patch that was generated based on the raw image data. The weights of the various filters of the hidden layers can be initially randomized before the neural network 300 is trained. The raw data input image can include, for example, a multidimensional array of numbers representing the color-filtered raw image pixels of the image. In one example, the array can include a 128 x 128 x 11 array of numbers, with 128 rows and 128 columns of pixel locations and 11 input values per pixel location. Such an example is described in more detail below in relation to Figure 7.
[0067] For a first training iteration of the network 300, the output may include values that do not give preference to any particular feature or node, due to the fact that the weights are randomly selected at initialization. For example, if the output is an array with multiple color components per pixel location, the output image may show an inaccurate color representation of the input. With the initial weights, the network 300 is unable to determine low-level features and, therefore, cannot make an accurate determination of what the color values might be. A loss function can be used to analyze the error in the output. Any suitable loss function definition can be used. One example of a loss function is the mean squared error (MSE). The MSE is defined as E_total = (1/n) * Σ (target − output)^2, which averages the squared differences (the actual response minus the predicted response (the output), squared). The term n is the number of values in the sum. The loss can be set equal to the value of E_total.
[0068] The loss (or error) will be high for the first training data (raw image data and corresponding output images), since the actual values will be very different from the expected output. The goal of training is to minimize the amount of loss so that the predicted output matches the training label. The neural network 300 can perform a backward pass by determining which inputs (weights) contributed most to the loss of the network, and can adjust the weights so that the loss decreases and is eventually minimized.
[0069] In some cases, a derivative (or another suitable function) of the loss with respect to the weights (denoted dL/dW, where W are the weights in a particular layer) can be computed to determine the weights that contributed most to the loss of the network. After the derivative is computed, a weight update can be performed by updating all the weights of the filters. For example, the weights can be updated so that they change in the opposite direction of the gradient. The weight update can be denoted as w = w_i − η * dL/dW, where w denotes a weight, w_i denotes the initial weight, and η denotes the learning rate. The learning rate can be set to any suitable value, with a high learning rate producing larger weight updates and a lower value producing smaller weight updates.
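The following sketch runs this loop (forward pass, MSE loss, derivative with respect to the weights, update opposite the gradient) for a single linear filter standing in for the network; the shapes, targets, and learning rate are illustrative assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 11))        # 64 training inputs, 11 values each
true_w = rng.normal(size=11)
target = x @ true_w                  # outputs the filter should learn to produce

w = rng.normal(size=11)              # weights randomized before training ([0066])
eta = 0.01                           # learning rate

for step in range(200):
    output = x @ w                                    # forward pass
    loss = np.mean((target - output) ** 2)            # MSE: (1/n) * sum((target - output)^2)
    dL_dw = -2.0 / len(x) * x.T @ (target - output)   # derivative of the loss w.r.t. the weights
    w = w - eta * dL_dw                               # step opposite the gradient: w = w_i - eta * dL/dW

print(loss)   # approaches 0 as the weights become accurately tuned
```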
[0070] The neural network (for example, the neural network 300) used by the machine learning ISP can include a convolutional neural network (CNN). Figure 5 is a high-level diagram illustrating a CNN 500. The input includes the raw image data 510, which can include a patch of a frame of raw image data or a complete frame of raw image data. The hidden layers of the CNN include a multichannel convolutional layer 512a and an activation unit (for example, a non-linear layer, such as an exponential linear unit (ELU) or another suitable function). For example, the raw image data can be passed through the series of hidden multichannel convolutional layers and activation units to obtain an output image 514 at the output layer.
[0071] The first layer of the CNN 500 includes the convolutional layer 512a. The convolutional layer 512a analyzes the raw image data 510. Each node of the convolutional layer 512a is connected to a region of nodes (pixels) of the input image called a receptive field. The convolutional layer 512a can be considered as one or more filters (each filter corresponding to a different feature map), with each convolutional iteration of a filter being a node or neuron of the convolutional layer 512a. For example, the region of the input image that a filter covers in each convolutional iteration can be the receptive field for the filter. In an illustrative example, if the input image includes a 28 x 28 array, and each filter (and corresponding receptive field) is a 5 x 5 array, then there will be 24 x 24 nodes in the convolutional layer 512a. Each connection between a node and a receptive field for that node learns a weight and, in some cases, an overall bias, so that each node learns to analyze its particular local receptive field in the input image. Each node of the convolutional layer 512a will have the same weights and bias (called a shared weight and a shared bias). For example, the filter has an array of weights (numbers) and a depth, referred to as channels. Examples provided below include filter depths of 64 channels.
[0072] The convolutional nature of the convolutional layer 512a is due to each node of the convolutional layer being applied to its corresponding receptive field. For example, a filter of the convolutional layer 512a can start in the upper-left corner of the input image array and can convolve around the input image. As noted above, each convolutional iteration of the filter can be considered a node of the convolutional layer 512a. At each convolutional iteration, the values of the filter are multiplied by a corresponding number of the original pixel values of the image (for example, the 5 x 5 filter array is multiplied by a 5 x 5 array of input pixel values in the upper-left corner of the input image array). The multiplications of each convolutional iteration can be added together (or otherwise combined) to obtain a total sum for that iteration or node. The process is continued at a next location in the input image according to the receptive field of a next node of the convolutional layer 512a. For example, a filter can be moved by a stride amount to the next receptive field. The stride amount can be set to 1, 8, or another suitable amount, and it can be different for each hidden layer. For example, if the stride amount is set to 1, the filter will be moved to the right by 1 pixel at each convolutional iteration. Processing the filter at each unique location of the input volume produces a number representing the filter results at that location, resulting in a total sum value being determined for each node of the hidden convolutional layer 512a.
[0073] The mapping from the input layer to the convolutional layer 512a (or from one convolutional layer to a next convolutional layer) is called a feature map (or a channel, as described in more detail below). A feature map includes a value for each node representing the filter results at each location of the input volume. For example, each node of a feature map can include a weighted feature data value. The feature map can include an array that includes the various total sum values resulting from each iteration of the filter over the input volume. For example, the feature map will include a 24 x 24 array if a 5 x 5 filter is applied to each pixel (a stride amount of 1) of a 28 x 28 input image. The convolutional layer 512a can include multiple feature maps to identify multiple features in an image. The example shown in Figure 5 includes three feature maps. Using three feature maps (or channels), the convolutional layer 512a can provide a three-dimensional representation of the data at each pixel location of the final output image 514.
[0074] In some examples, an activation unit 512b can be applied after each convolutional layer 512a. The activation unit 512b can be used to introduce non-linearity into a system that has been computing linear operations. An illustrative example of a non-linear layer is a rectified linear unit (ReLU) layer. Another example is an ELU. A ReLU layer can apply the function f(x) = max(0, x) to all values in the input volume, which changes all negative activations to 0. The ReLU can thus increase the non-linear properties of the network 500 without affecting the receptive fields of the convolutional layer 512a.
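A minimal sketch of the convolution-plus-activation step described in the last few paragraphs is shown below: a shared-weight 5 x 5 filter slid with stride 1 over a 28 x 28 input produces the 24 x 24 feature map from the example above, followed by the ReLU f(x) = max(0, x). The loop-based implementation and random values are for clarity of the mechanics only.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide a shared-weight filter over every receptive field of the input
    ('valid' convolution, no padding), summing the elementwise products at
    each location to produce one feature-map value per node."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    fmap = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            field = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            fmap[i, j] = np.sum(field * kernel)   # total sum for this node
    return fmap

image = np.random.rand(28, 28)
fmap = conv2d_valid(image, np.random.rand(5, 5))  # 5x5 filter, stride 1
assert fmap.shape == (24, 24)                     # 24 x 24 nodes, as in the text
activated = np.maximum(fmap, 0.0)                 # ReLU: f(x) = max(0, x)
```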
[0075] Figure 6 is a diagram illustrating a more detailed example of a convolutional neural network 600 of a machine learning ISP. The input to the network 600 is a raw image patch 621 (for example, having a Bayer pattern) from a frame of raw image data, and the output includes an RGB output patch 630 (or a patch having other color component representations, such as YUV). In an illustrative example, the network takes raw 128 x 128 pixel image patches as input and produces 8 x 8 x 3 RGB patches as a final output. Based on the convolutional nature of the various convolutional filters applied by the network 600, many of the pixel locations outside the 8 x 8 array from the raw image patch 621 are consumed by the network 600 to obtain the final 8 x 8 output patch. The reduction in data from input to output occurs due to the amount of context needed to understand the neighboring information for processing a pixel. Having the larger input raw image patch 621, with all the neighboring information and context, is useful for processing and producing the smaller output RGB patch 630.
[0076] In some examples, based on the reduction in pixel locations from input to output, the 128 x 128 raw image patches are designed so that they overlap in the raw input image. In such examples, the 8 x 8 outputs are non-overlapping. For example, for a first 128 x 128 raw image patch in the upper-left corner of the raw image frame, a first 8 x 8 RGB output patch is produced. The next 128 x 128 patch in the raw image frame will be 8 pixels to the right of the previous 128 x 128 patch, and will thus overlap the previous 128 x 128 pixel patch. The next 128 x 128 patch will be processed through the network 600 to produce a second 8 x 8 RGB output patch. The second 8 x 8 RGB patch will be placed next to the first 8 x 8 RGB output patch (produced using the first 128 x 128 raw image patch) in the complete final output image. Such a process can be performed until the 8 x 8 patches that make up a complete output image have been produced.
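The overlapping-patch scheme can be sketched as follows, with the network itself replaced by a placeholder: input patches step by 8 pixels so they overlap by 120 pixels, while the 8 x 8 outputs tile without overlapping. The run_network function is a hypothetical stand-in for the trained model, and border alignment and padding (real use would align outputs to the patch centers) are deliberately left out of this sketch.

```python
import numpy as np

PATCH, OUT = 128, 8   # input patch size and output patch size from the example

def run_network(patch):
    """Placeholder for the trained network: maps a 128x128 raw patch
    (plus any augmentation channels) to an 8x8x3 RGB patch."""
    return np.zeros((OUT, OUT, 3))

def process_frame(raw):
    """Tile the frame with input patches that overlap by (PATCH - OUT)
    pixels so the non-overlapping 8x8 outputs assemble a complete image."""
    h, w = raw.shape
    out = np.zeros((h - PATCH + OUT, w - PATCH + OUT, 3))
    for i in range(0, h - PATCH + 1, OUT):        # step by 8 pixels vertically
        for j in range(0, w - PATCH + 1, OUT):    # step by 8 pixels horizontally
            out[i:i+OUT, j:j+OUT] = run_network(raw[i:i+PATCH, j:j+PATCH])
    return out

frame = np.random.rand(256, 256)
image = process_frame(frame)   # (136, 136, 3) for this toy frame size
```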
[0077] Additional inputs 622 can also be provided in conjunction with the raw image patch 621. For example, the additional inputs 622 can be provided by the preprocessing mechanism 207 to the neural network system 203. The additional inputs 622 can include any suitable supplementary data that can augment the color information provided by the raw image patch 621, such as tone data, radial distance data, automatic white balance (AWB) gain data, a combination thereof, or any other additional data that can augment the pixels of the input data. By supplementing the raw input pixels, the input becomes a set of multidimensional values for each pixel location of the raw image data.
[0078] Figure 7 is a diagram illustrating an example of a set of multidimensional inputs for a raw image patch 731. The example shown in Figure 7 includes an input of size 128 x 128 x 11. For example, there are 11 total inputs (dimensions) provided for each pixel location of the raw image patch 731. The 11 input dimensions include four dimensions for the colors, including one dimension for the red values 732a, two dimensions for the green values 733a and the green values 734a, and one dimension for the blue values 735a. There are two sets of green values 733a and 734a because the Bayer pattern has a green color in every row, and only one set of red values 732a and one set of blue values 735a because the Bayer pattern has each of the colors red and blue in every other row. For example, as shown, the odd rows of the raw image patch 731 include the red and green colors at alternating pixels, and the even rows include the green and blue colors at alternating pixels. The white space between the pixels in each color dimension (the red values 732a, the green values 733a and 734a, and the blue values 735a) shows the spatial layout of those colors in the raw image patch 731. For example, if all the red values 732a, green values 733a and 734a, and blue values 735a were combined together, the result would be the raw image patch 731.
[0079] The input additionally includes a dimension for the relative radial distance measure 736, which indicates the distance of each pixel from the center of the patch or frame. In some examples, the radial distance is the normalized distance from the center of the image. For example, the pixels in the four corners of the image can have a distance equal to 1.0, while the pixel in the center of the image can have a distance equal to 0. In such examples, all other pixels can have distances between 0 and 1 based on the distance of those pixels from the central pixel. Such radial distance information can help supplement the pixel data, since the behavior of the image sensor may be different in the center of an image versus the corners of the image. For example, the corners and edges of an image may be noisier than the pixels in the center, as less light falls on the corners of the image sensor's lens, in which case more gain and/or noise reduction may be applied to the corner pixels. The input also includes four dimensions for the square roots of the colors. For example, one red square root dimension 732b, two green square root dimensions 733b and 734b, and one blue square root dimension 735b are provided. The use of the red, green, and blue square roots helps to better correlate the tone of the pixels. The last two dimensions are for the gains of the entire patch, including one dimension for the red automatic white balance (AWB) gain 737 and one dimension for the blue AWB gain 738. AWB adjusts the gains of the different color components (for example, R, G, and B) relative to each other in order to make white objects appear white. The additional data helps the convolutional neural network 600 understand how to render the final RGB output patches.
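A sketch of assembling the 11 per-pixel input dimensions just described (four sparse Bayer color planes, the normalized radial distance, the four square-root planes, and the two patch-wide AWB gains) is shown below. The channel ordering, the RGGB layout, and the exact distance normalization are assumptions for illustration; the patent does not fix these details here.

```python
import numpy as np

def build_input(raw, r_gain, b_gain):
    """Assemble the 11 per-pixel input dimensions: 4 sparse color planes,
    the radial distance, 4 square-root color planes, and the two scalar
    AWB gains broadcast over the whole patch."""
    h, w = raw.shape                               # e.g. a 128 x 128 Bayer patch
    planes = np.zeros((h, w, 11))
    # Four color planes, kept in their Bayer spatial layout (RGGB assumed).
    planes[0::2, 0::2, 0] = raw[0::2, 0::2]        # R
    planes[0::2, 1::2, 1] = raw[0::2, 1::2]        # G1
    planes[1::2, 0::2, 2] = raw[1::2, 0::2]        # G2
    planes[1::2, 1::2, 3] = raw[1::2, 1::2]        # B
    # Normalized radial distance: 0 at the center, 1.0 at the corners.
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - (h - 1) / 2, xs - (w - 1) / 2)
    planes[..., 4] = dist / dist.max()
    # Square roots of the four color planes (tone correlation).
    planes[..., 5:9] = np.sqrt(planes[..., 0:4])
    # Patch-wide AWB gains for red and blue.
    planes[..., 9] = r_gain
    planes[..., 10] = b_gain
    return planes

x = build_input(np.random.rand(128, 128), r_gain=1.9, b_gain=1.6)
assert x.shape == (128, 128, 11)
```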
[0080] Returning to Figure 6, and using the example of Figure 7 for illustrative purposes, the 128 x 128 x 11 input data is provided to the convolutional neural network 600 for processing. The convolutional filters of the network 600 provide a functional mapping from the input volume of the raw 128 x 128 image patch 621 to the 8 x 8 RGB output patch 630. For example, the network 600 operates to apply the various convolutional filter weights, tuned during the training stage, to the input features in different ways to ultimately produce the 8 x 8 RGB output patch 630. The convolutional filters include CNN1 with stride 623, CNN2 with stride 624, CNN3 with stride 625, CNN 631, CNN 632, and the other CNNs described below (CNN4 626, CNN5 627, CNN6 628, and CNN7 629).
[0081] Each channel of each convolutional filter (for example, one of the CNNs shown in Figure 6) has weights that represent one dimension or feature of an image. The plurality of channels included in each convolutional filter or CNN provides high-dimensional representations of the data at each pixel (with each channel providing an additional dimension). As the raw image patch 621 is passed through the various convolutional filter channels of the network 600, weights are applied to transform these high-dimensional representations as the data move through the network, and to eventually produce the final RGB output patch 630. In an illustrative example, one channel of one of the convolutional filter CNNs can include information for detecting a vertical edge at a pixel location. A next channel can include information for detecting a horizontal edge at each pixel location. A next channel can include information for detecting a diagonal edge. Other channels can include information related to color, noise, lighting, whiteness, and/or any other suitable features of an image. Each channel can represent a dimension of the pixel, and can provide information about the pixel that the network 600 is capable of generating. In some cases, the convolutional filters that work at lower resolutions (CNN1 623, CNN2 624, and CNN3 625), as described in more detail below, include information related to larger-scale representations of the data, such as lower-frequency colors for a general area, or other higher-level features. The other convolutional filters (CNN4 626, CNN5 627, CNN6 628, and CNN7 629) include information on smaller-scale representations of the data.
[0082] The concept of channels is described in relation to Figure 8. Figure 8 is a diagram illustrating an exemplary structure of a neural network that includes a repetitive set of convolutional filters 802, 804, and 806. The convolutional filter 802 includes a first CNN (shown as CNN1 in Figure 8) that includes 20 channels of 3 x 3 filters with a stride of 1, applied to a 16 x 16 input to produce a 14 x 14 x 20 volume.
[0083] The 14 x 14 x 20 volume includes 14 rows and 14 columns of values due to the convolutional application of the 3 x 3 filters. For example, the 3 x 3 filters have a stride of 1, which means that the filters can be passed to each pixel location (for example, so that each pixel location is in the upper-left corner of the filter array) for the first 14 rows and 14 columns of pixels of the 16 x 16 input matrix before the filter array reaches the end of the block. The result is a 14 x 14 array of weighted values for each of the 20 channels.
[0084] The convolutional filter 804 includes a second CNN (shown as CNN2 in Figure 8) that includes 12 channels of 5 x 5 filters and has a stride of 1. The input to the convolutional filter 804 includes the 14 x 14 x 20 volume that is output from the convolutional filter 802.
[0085] The convolutional filter 806 includes a third CNN (shown as CNN3 in Figure 8) that includes 3 channels of 7 x 7 filters with a stride of 1 (no padding). The input to the convolutional filter 806 includes the 14 x 14 x 12 volume output from the convolutional filter 804. The 7 x 7 filter for each of the 3 channels is convolutionally applied to the 14 x 14 x 12 volume to generate an 8 x 8 x 3 patch of color values for an output image 808. For example, the 8 x 8 x 3 patch can include an 8 x 8 array of pixels for the color red, an 8 x 8 array of pixels for the color green, and an 8 x 8 array of pixels for the color blue. The application of the three 7 x 7 filters to the input volume results in 1,764 parameters (7 x 7 x 12 x 3). The total number of parameters for such a network is 8,304.
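The stated parameter counts can be checked directly: each layer contributes kernel_width x kernel_height x input_channels x output_channels weights. The sketch below reproduces the 1,764 and 8,304 totals, assuming a 3-channel 16 x 16 input (the input depth that makes the stated total come out exactly, which the text does not state explicitly) and no bias terms.

```python
# Parameter count for the toy three-layer network above (no biases assumed).
layers = [  # (kernel_size, in_channels, out_channels)
    (3, 3, 20),   # CNN1: 16x16x3  -> 14x14x20 (3x3, stride 1, no padding)
    (5, 20, 12),  # CNN2: 14x14x20 -> 14x14x12 (5x5, stride 1, padded per the text)
    (7, 12, 3),   # CNN3: 14x14x12 -> 8x8x3   (7x7, stride 1, no padding)
]
counts = [k * k * cin * cout for k, cin, cout in layers]
print(counts)        # [540, 6000, 1764] -- CNN3 matches the stated 1,764
print(sum(counts))   # 8304, matching the stated total
```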
[0086] Returning to Figure 6, the raw image patch 621 is at full resolution. The structure of the convolutional neural network 600 is such that the convolutional filters operate at resolutions different from that of the raw image patch 621. A scaled approach can be used to combine different resolutions of weighted data representing the raw data of the raw image patch 621. A hierarchical architecture can be useful for spatial processing. Noise reduction can be used as an illustrative example, in which case there is low-frequency and high-frequency noise. To effectively remove low-frequency noise (noise that covers a large area of the image), very large spatial kernels are needed. If a reduced-resolution version of the image is present (for example, 1/64 resolution, 1/16 resolution, 1/4 resolution, or the like), then a smaller filter can be used at the reduced resolution to effectively apply a very large spatial kernel (for example, a 3 x 3 filter at 1/64 resolution is approximately a (3*8) x (3*8) kernel). Operating at lower resolutions thus allows the network 600 to process lower frequencies efficiently. This process can be repeated by combining the information from the lowest-frequency/lowest-resolution processing with the next-higher resolution, to work on the data at the next frequency/resolution. For example, using the scaled approach with different resolutions, the weighted values resulting from the different resolutions can be combined, and in some cases, the combined result can then be combined with another resolution of weighted data representing the raw image patch 621. This can be iterated until the full resolution (or another desired resolution) is formed.
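The data flow of this coarse-to-fine scheme can be sketched as below, with the learned strided convolutions replaced by simple 2x2 averaging, the upscaling by nearest-neighbor repetition, and the learned combination by addition; these substitutions are purely illustrative of how a small kernel at 1/64 resolution acts over a large full-resolution area.

```python
import numpy as np

def downscale2(x):
    """Stand-in for a stride-2 convolution: 2x2 averaging halves the
    resolution (a learned strided filter would do this with weights)."""
    return 0.25 * (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2])

def upscale2(x):
    """Nearest-neighbor upscaling back to the next-higher resolution."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def hierarchical_process(img, levels=3):
    """Build a pyramid (1/4, 1/16, 1/64 of the area), then process from the
    lowest resolution upward, combining each result with the next level."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downscale2(pyramid[-1]))
    out = pyramid[-1]                       # lowest frequency / lowest resolution
    for level in reversed(pyramid[:-1]):
        out = level + upscale2(out)         # combine with the next-higher resolution
    return out                              # full resolution is formed last

result = hierarchical_process(np.random.rand(64, 64))
```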
[0087] Stride convolutional filters (for example, stride CNNs) can be designed to generate the reduced-resolution weighted outputs that represent the data of the raw image patch 621. Different sizes of filter arrays can be used for the stride convolutional filters, and each stride convolutional filter includes a stride value greater than 1. Examples of resolutions at which the network 600 can operate include 1/64 resolution, 1/16 resolution, 1/4 resolution, full resolution, or any other suitable resolution.
[0088] [0088] Figure 9, Figure 10 and Figure 11A to Figure 11E illustrate the application of a stride CNN. For example, Figure 9 is a diagram that illustrates an example of a raw image patch 900. The raw image patch 900 includes an M x N array of pixels, where M and N are integer values. The value of M and the value of N can be the same or can be different values. In the example shown in Figure 9, the value of M is equal to 8 and the value of N is equal to 8, making the raw image patch 900 an 8 x 8 matrix of 64 raw image pixels. The pixels in the raw image patch 900 are sequentially numbered from 0 to 63. In some cases, the raw image pixels in the raw image patch 900 may be in a Bayer pattern (not shown) or another suitable pattern. Figure 10 is a diagram that illustrates an example of an x-by-y stride convolutional filter 1000 of a CNN in a machine learning ISP neural network. The filter 1000 illustrated in the figure has an x value of 2 and a y value of 2, making the filter 1000 a 2 x 2 filter with weights w0, w1, w2 and w3. The filter 1000 has a stride of 2, which means that the filter 1000 is applied in a convolutional manner to the raw image patch 900 shown in Figure 9 with a stride amount of 2.
[0089] [0089] Figure 11A to Figure 11E are diagrams that illustrate an example of applying the 2 x 2 filter 1000 to the raw image patch 900. As shown in Figure 11A, the filter 1000 is first applied to the top-left pixels of the raw image patch 900. For example, the weights w0, w1, w2 and w3 of the filter 1000 are applied to pixels 0, 1, 8 and 9 of the raw image patch 900. As shown in Figure 11B, the weight w0 is multiplied by the pixel value 0, the weight w1 is multiplied by the pixel value 1, the weight w2 is multiplied by the pixel value 8 and the weight w3 is multiplied by the pixel value 9. The values that result from the multiplications can then be added (or otherwise combined) to generate an output A for that node of the filter 1000.
[0090] [0090] The filtering process for the stride CNN continues at a next location in the raw image patch 900 by moving the filter 1000 by the stride amount of 2 to the next receptive field. Because the stride amount of the stride CNN is set to 2, the filter 1000 is moved to the right by two pixels, as shown in Figure 11C. When moved to the right by two pixels, the weights w0, w1, w2 and w3 of the filter 1000 are applied to pixels 2, 3, 10 and 11 of the raw image patch 900. For example, as shown in Figure 11D, the weight w0 is multiplied by the pixel value 2, the weight w1 is multiplied by the pixel value 3, the weight w2 is multiplied by the pixel value 10 and the weight w3 is multiplied by the pixel value 11. The values (shown as w0*value(2), w1*value(3), w2*value(10), w3*value(11)) that result from the multiplications can then be added (or otherwise combined) to generate an output B for that node of the filter 1000.
[0091] [0091] A similar process can be applied until the filter 1000 has been convolved across the entire raw image patch 900. Figure 11E shows a feature map 1100 that results from the filter 1000 being applied to the raw image patch 900. The feature map 1100 includes the summed values A through P that result from each iteration of the filter 1000 over the raw image patch. The feature map 1100 represents a set of weighted, low resolution feature data values that provide a multidimensional representation (when multiple channels are used) of the data in each pixel of the raw image patch 900.
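A minimal sketch of this strided filtering in Python with NumPy follows; the all-ones weights are placeholders (a trained filter would have learned values for w0 through w3):

import numpy as np

def strided_conv2d(image, weights, stride):
    """Valid convolution (cross-correlation, as in CNNs) with a stride."""
    kh, kw = weights.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            field = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(field * weights)  # weighted sum for this node
    return out

patch = np.arange(64).reshape(8, 8)          # pixels 0..63, as in Figure 9
w = np.ones((2, 2))                          # placeholder weights w0..w3
feature_map = strided_conv2d(patch, w, stride=2)
print(feature_map.shape)                     # (4, 4): the 16 values A through P
print(feature_map[0, 0], feature_map[0, 1])  # 0+1+8+9 = 18, 2+3+10+11 = 26

With all weights set to 1, output A is 18 and output B is 26, matching the weighted sums described for Figures 11B and 11D.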
[0092] [0092] Returning to Figure 6, the stride convolutional filters of the convolutional neural network 600 include a CNN1 with stride 623, a CNN2 with stride 624 and a CNN3 with stride 625. The CNN1 with stride 623 can include several channels of convolutional filters that operate to generate feature map matrices that contain weighted data values (called feature data) representing the raw image data from the raw image patch 621. The feature map matrices generated by the CNN1 with stride 623 are a weighted 1/64-resolution representation of the raw image patch 621. The weighted values representative of the feature data can be obtained by convolving the CNN1 623 weight filter matrix across the 128 x 128 x 11 input volume in a way that reduces the input dimensionality by 1/8 in each of the vertical and horizontal directions (resulting in a 1/64 total resolution reduction). For example, the 128 x 128 input matrix (with a depth of 11) can be reduced to a 16 x 16 matrix of weighted values by applying a filter matrix with a stride of 8.
[0093] [0093] The result of the CNN1 with stride 623 is a reduced resolution set of weighted feature data values that provide a multidimensional representation of the features of the raw image patch 621. For example, the weighted feature data values provide multidimensional representations of the data in each pixel of the raw image patch 621. In cases where each convolutional filter has 64 channels, the CNN1 with stride 623 generates 64 16 x 16 feature map matrices of weighted values. After the CNN1 with stride 623, which performs stride convolutions as described above, a CNN8 631 is provided to process the output of CNN1 623. CNN8 631 can include a series of convolutions with a stride equal to 1. For example, the 64 16 x 16 matrices from CNN1 623 can be reduced to 64 8 x 8 matrices by CNN8 631. The 8 x 8 matrices from CNN8 631 can then be upsampled to a size of 16 x 16 before being combined with the matrices from CNN9 632, as described below. A benefit of downsampling the data and then upsampling it is a reduced computation requirement. For example, the downsampled result is processed by CNN8 631 in order to gather information at the lowest resolution. If the data were not downsampled first, larger filters might be needed to obtain a similar result at the higher resolution.
[0094] [0094] In parallel with the CNN1 with stride 623, a 1/16-resolution CNN2 with stride 624 produces 64 1/16-resolution feature map matrices of weighted values. In an illustrative example, CNN2 624 can first apply a 2 x 2 filter matrix with a stride of 2 to the 128 x 128 x 11 volume of raw image data (associated with the raw image patch 621) to generate 64 x 64 matrices of weighted feature data values. Another 2 x 2 filter matrix can be applied to the 64 x 64 matrices of weighted values to generate a 32 x 32 feature map matrix of feature data values. In another illustrative example, a 4 x 4 filter matrix can be applied with a stride of 4 to the 128 x 128 input raw image patch 621 to reduce the matrix from 128 x 128 to 32 x 32. Any other filter matrix size and stride amount can be used to generate a feature map matrix of weighted feature data values that is 1/16 the size of the raw image patch 621. The CNN2 with stride 624 has a plurality of channels (for example, 64 or another suitable value) and will apply all 64 different filter matrices. When 64 channels are used, the result will be 64 different 32 x 32 matrices of weighted values, with each matrix representing a different representation of the raw image patch 621 data at 1/16 resolution.
[0095] [0095] After the CNN2 with stride 624, a CNN9 632 is provided to process the output of CNN2 624. CNN9 632 is similar to CNN8 631 and may include a series of convolutions with a stride equal to 1. For example, the 32 x 32 matrices from CNN2 624 can be reduced to 16 x 16 matrices by CNN9 632. As shown, the 64 feature map matrices of weighted feature data values from CNN8 631 are combined with the 64 16 x 16 feature map matrices of weighted feature data values from CNN9 632. As noted above, the 16 x 16 matrices from CNN1 623 can be reduced to 8 x 8 matrices by CNN8 631. To combine the lower resolution 8 x 8 matrices with the larger 16 x 16 matrices, the lower resolution data needs to be upsampled so that the values in the CNN8 631 and CNN9 632 matrices can be combined. In some examples, the 8 x 8 matrices from CNN8 631 can be upsampled by increasing each matrix to a 16 x 16 size and duplicating the values of the 8 x 8 matrices horizontally and vertically, so that the upscaled 16 x 16 matrix has a value at each node. The weighted values of the upscaled 16 x 16 matrices can then be added to the weighted values of the 16 x 16 matrices from CNN9 632 to produce the combined 16 x 16 matrices of weighted values. Because the number of channels for each convolutional filter (for example, CNN8 631 and CNN9 632) is the same, the dimensions (which correspond to the number of channels) line up so the matrices can be added together.
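A minimal sketch of this upsample-and-add combination follows (random arrays stand in for the CNN8 631 and CNN9 632 outputs; the duplication along each spatial axis is the nearest-neighbor upsampling described above):

import numpy as np

channels = 64
low = np.random.rand(channels, 8, 8)      # stand-in for the CNN8 631 output
high = np.random.rand(channels, 16, 16)   # stand-in for the CNN9 632 output

# Nearest-neighbor upsampling: duplicate each value horizontally and
# vertically so the 8 x 8 maps become 16 x 16 maps.
upsampled = low.repeat(2, axis=1).repeat(2, axis=2)

combined = upsampled + high               # element-wise sum, channel by channel
print(combined.shape)                     # (64, 16, 16)

The element-wise addition only lines up because both branches carry the same number of channels, which is the design point made above.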
[0096] [0096] The 64 combined 16 x 16 feature map matrices of weighted values (based on the combination of the matrices from CNN8 631 and CNN9 632) are then processed by CNN4 626. CNN4 626, CNN5 627, CNN6 628 and CNN7 629 can include the same number of channels (with weights representing different data dimensions), such as the 64 channels used in the examples above. CNN4 626, CNN5 627, CNN6 628 and CNN7 629 also have a stride equal to 1 and therefore are not called stride filters. For example, CNN4 626, CNN5 627, CNN6 628 and CNN7 629 can include 64 channels of 3 x 3 filters that have a stride of 1.
[0097] [0097] As noted above, the 64 combined 16 x 16 feature map matrices of weighted values are processed by CNN4 626. CNN4 626 processes these 16 x 16 matrices with a series of convolutional layers (with a stride equal to 1) until the matrices are reduced to 8 x 8. The output of CNN4 626 is then upsampled from 8 x 8 to 16 x 16 matrices before being combined with the matrices from CNN10 633.
[0098] [0098] The CNN3 with stride 625 processes the raw image patch 621 in a way that reduces the resolution from 128 x 128 to 64 x 64. In an illustrative example, CNN3 625 can apply a 2 x 2 filter matrix with a stride of 2 to the 128 x 128 x 11 volume of raw image data to generate 64 x 64 feature map matrices of weighted feature data values. After the CNN3 with stride 625, a CNN10 633 is provided to process the output of CNN3 625.
[0099] [0099] The combined 16 x 16 feature map matrices are then processed by CNN5 627 to produce additional sets of weighted matrices. The output of CNN5 627 is upsampled to full resolution, and the full resolution feature map matrices with weighted full resolution feature data values are combined with a set of full resolution feature map matrices emitted from CNN6 628. CNN6 628 operates on the full resolution version of the raw image patch 621. The full resolution CNN6 628 can be used so that the network 600 can generate a full resolution RGB pixel output. The full resolution can be used in cases where it is desired or important for the application to provide a full resolution image. The full resolution CNN6 628 is required to produce a full resolution image at the output. For applications that need only a partial resolution image, the full resolution layer (CNN6 628) can be removed or omitted from the network 600.
[0100] [0100] The combined full resolution feature map matrices are then processed by CNN7 629 to produce the final RGB output patch 630, which is based on the raw image patch 621. The RGB output patch 630 can be determined based on multidimensional data or features determined by the various convolutional filters of the convolutional neural network 600. Using the example above, the convolutional filters of the network 600 provide a functional mapping (based on the various weights of the convolutional filters) of the input volume of the 128 x 128 raw image patch 621 to the 8 x 8 RGB output patch 630. In some examples, the RGB output patch 630 includes a red color component, a green color component and a blue color component per pixel. A person of ordinary skill in the art will note that color spaces other than RGB can also be used, such as luma and chroma (YCbCr or YUV) color components (in which case the plurality of color components per pixel includes a luma color component per pixel, a first chroma color component per pixel and a second chroma color component per pixel), or other suitable color components. In some examples (not shown in Figure 6), the output may be a monochrome image patch, in which case the network 600 performs noise reduction, tone mapping or another ISP-based function.
[0101] [0101] As described above, the pixel array in the RGB output patch 630 can have a dimension smaller than the dimension of the raw input image patch 621.
[0102] [0102] As noted above, the patches of an input frame of raw image data can be defined so that they overlap, which allows the complete output image to contain a complete image even in view of the reduced dimensionality from the input to the output. The resulting final output image contains processed output image patches derived from the raw input data by the convolutional neural network 600. The output image patches are arranged next to each other in an overlapping manner to produce the final output image (for example, the first output image patch, followed by the second output image patch, and so on). The final output image can be rendered for display, used for compression (or encoding), stored, or used for any other image-based purposes.
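A minimal sketch of this patch-based assembly loop follows, using the 128 x 128 input and 8 x 8 output sizes from the text. Here run_network is a placeholder for the trained network 600, and the reflected border padding of the frame is an assumption (the text does not specify how patches at the frame border are handled):

import numpy as np

IN, OUT = 128, 8
MARGIN = (IN - OUT) // 2                    # 60 pixels of context per side

def run_network(patch):
    # Placeholder: a real network 600 would map the 128 x 128 patch to an
    # 8 x 8 x 3 RGB patch; here the center is copied into three channels.
    center = patch[MARGIN:MARGIN + OUT, MARGIN:MARGIN + OUT]
    return np.stack([center] * 3, axis=-1)

frame = np.random.rand(512, 512)            # stand-in raw single-channel frame
padded = np.pad(frame, MARGIN, mode="reflect")
output = np.zeros((512, 512, 3))

for y in range(0, 512, OUT):
    for x in range(0, 512, OUT):
        window = padded[y:y + IN, x:x + IN]  # overlapping 128 x 128 input
        output[y:y + OUT, x:x + OUT] = run_network(window)

Stepping the input window by 8 pixels makes the overlapping 128 x 128 inputs produce 8 x 8 outputs that tile the frame without gaps.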
[0103] [0103] In some cases, the full resolution raw image patch 621 can be cropped before being processed by one or more of the convolutional filters of the convolutional neural network 600. For example, to obtain the reduced size output (for example, to move from a 128 x 128 input to an 8 x 8 output), more convolutional layers are needed to process the larger inputs at full resolution. The raw image patch 621 can be cropped by removing some of the pixels at the edges of the raw image patch 621 before applying the convolutional filters to the patch 621. Cropping is optional at each convolutional filter based on the needs of the network 600. In an illustrative example, the raw image patch 621 can be cropped for the full resolution CNN6 628 described above, which produces a full resolution feature map matrix. For example, due to the fact that the final RGB output patch 630 has a reduced size (for example, an 8 x 8 matrix), not all pixel location inputs of the 128 x 128 full resolution input may be required to provide the pixel-level context for the 8 x 8 center of the raw image patch 621. The pixel neighborhood in the full resolution raw image patch 621 that is likely to impact the details of the final 8 x 8 output is closer to the 8 x 8 pixel set around the center of the raw image patch 621. In such cases, the raw image patch 621 can be cropped so that a smaller neighborhood of pixels surrounds the central 8 x 8 portion of the raw image patch 621. In an illustrative example, a 32 x 32 pixel array around the center can be cropped from the full resolution raw image patch 621.
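A minimal sketch of the 32 x 32 center crop from the illustrative example above:

import numpy as np

patch = np.random.rand(128, 128)            # stand-in full resolution patch
crop = 32
start = (patch.shape[0] - crop) // 2        # 48
center_crop = patch[start:start + crop, start:start + crop]
print(center_crop.shape)                    # (32, 32): the neighborhood kept
                                            # for the full resolution branch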
[0104] [0104] In some cases, the network 600 can be designed to avoid batch normalization and pooling, and is designed to have no padding. For example, the network 600 intentionally has no batch normalization layers or pooling layers, and in some cases has no padding. Pooling can be excluded from the network 600 due to the fact that pooling layers can disrupt the resolution of an image. For image signal processing functions, a highly detailed result is desired at a particular resolution, in which case pooling is not useful. Normalization layers can also be removed. At different layers, the batch normalization that is typically performed in some networks scales and shifts the data at the particular layer to provide a better range of data for the next layers to process. Such normalization layers can be useful for classification problems, due to the fact that classification systems try to determine whether a particular feature or class is present, so if the output of a layer is scaled and shifted, the result is still preserved because the data is scaled and shifted by the same amount. However, for the regression problem that the machine learning ISP neural network performs to move from a continuous-value input to a continuous-value output, the way in which different pixels are shifted and scaled in relation to each other cannot be arbitrary. For example, the colors of the image need to be well preserved, and the different details in an image patch need to be preserved to make sense in the larger scheme of the entire image, among other considerations. For these and other reasons, the normalization layers can be omitted from the network 600.
[0105] [0105] The network 600 also does not include a fully connected layer, and instead uses a CNN (CNN7 629) as the last layer in the network 600. An advantage of the fully convolutional network (with no fully connected layer) is that the network is not limited by size. For example, CNNs are translationally invariant. Due to the fact that processing in the network 600 is translationally invariant, the same learned filters can be applied to larger or smaller input sizes. For example, if an input size needs to be 256 x 256, the same parameters as the 128 x 128 network in Figure 6 can be used. Another advantage of the fully convolutional network is that fully connected layers have many more parameters and require much more computation compared to using only convolutional layers, as shown in Figure 6. For example, if a fully connected layer were used to generate the RGB output patch 630, the number of parameters could be much greater than if only CNNs were used, as shown in Figure 6.
[0106] [0106] As noted above, the RGB output patches are arranged side by side to produce the final output image. Since no padding is applied to the data, seams in the final output image can be avoided. For example, padding the data can create artificial information at the edges, which in turn can cause seams. Rather than padding the data to make the width and/or height larger, the network uses filtering operations that allow it to work on the actual data in the image.
[0107] [0107] By using machine learning to perform the ISP functions, the ISP becomes customizable. For example, different functionalities can be developed and applied by presenting examples of targeted data and changing the network weights through training. The machine-learning-based ISP can also receive quick feedback for updates, compared to hardwired or heuristic-based ISPs. In addition, a machine-learning-based ISP removes the time-consuming task of adjusting the tuning parameters that are required for standard ISPs. For example, a significant amount of effort and personnel is used to manage the ISP infrastructure. A holistic development approach can be used for the machine learning ISP, in which the end-to-end system is directly optimized and created. This holistic development is in contrast to the piece-by-piece development of standard ISP function blocks. Imaging innovation can also be accelerated based on the machine learning ISP. For example, a customizable machine learning ISP unlocks many possibilities for innovation, allowing developers and engineers to more quickly create, develop and adapt solutions for working with innovative sensors, lenses, camera arrays, and more.
[0108] [0108] Figure 13 is a flow chart that illustrates an example of a process 1300 for processing image data using one or more neural networks with the techniques described in this document. At block 1302, the process 1300 includes obtaining a patch of raw image data. The raw image data patch includes a subset of pixels from a frame of raw image data captured using one or more image sensors. The raw image data patch includes a single color component for each pixel in the pixel subset. In some examples, the raw image data frame includes image data from one or more image sensors filtered by a color filter matrix. The color filter matrix can include any suitable color filter, such as a Bayer color filter matrix.
[0109] [0109] At block 1304, the process 1300 includes applying at least one neural network to the patch of raw image data to determine a plurality of color component values for one or more pixels in the pixel subset. At block 1306, the process 1300 includes generating an output image data patch based on applying the at least one neural network to the raw image data patch. The at least one neural network can also be applied to other patches of the raw data input frame. The output image data patch includes a subset of pixels from an output image data frame. The patch also includes the plurality of color component values for one or more pixels in the subset of pixels of the output image data frame. The at least one neural network is designed to reduce the amount of data from the incoming raw image data patch. For example, applying the at least one neural network causes the output image data patch to include fewer pixels (or pixel locations) than the raw image data patch. For example, using the examples above with a 128 x 128 input patch of raw input data, an output image patch can include an 8 x 8 pixel patch that will be part of an output image. As noted above, the patches of the raw image data input frame can be defined so that they overlap, which allows the output image to contain a complete image even in view of the reduced dimensionality from the input to the output. The output image patches can be arranged next to each other in an overlapping manner to produce the final output image. The final output image can be rendered for display, used for compression (or encoding), stored, or used for any other image-based purposes.
[0110] [0110] In some implementations, applying the at least one neural network to the raw image data patch includes applying one or more stride convolutional filters to the raw image data patch to generate reduced resolution data representative of the raw image data patch. For example, a stride convolutional filter can include a convolutional filter with a stride greater than one. Each stride convolutional filter of the one or more stride convolutional filters includes a matrix of weights. Examples of stride convolutional filters include the CNN1 with stride 623, the CNN2 with stride 624 and the CNN3 with stride 625 described above in relation to Figure 6. In some examples, each stride convolutional filter of the one or more stride convolutional filters can include a plurality of channels. Each channel of the plurality of channels includes a different weight matrix. The channels are high dimensional representations of the data at each pixel. For example, with the use of the plurality of channels, the neural network can transform these high dimensional representations as the data moves through the neural network.
[0111] [0111] As noted above, the one or more stride convolutional filters can include a plurality of stride convolutional filters. For example, the plurality of stride convolutional filters includes a first stride convolutional filter that has a first weight matrix and a second stride convolutional filter that has a second weight matrix. Applying the first stride convolutional filter to the raw image data patch generates a first set of weighted data representative of the raw image data patch, the first set of weighted data having a first resolution. Applying the second stride convolutional filter generates a second set of weighted data representative of the raw image data patch. The second set of weighted data has a second resolution that is less than the first resolution. In some cases, the second stride convolutional filter can be applied to the raw image data patch to generate the second weighted data set. Such an example is shown in Figure 6, where the CNN2 with stride 624 is an example of the first stride convolutional filter and the CNN1 with stride 623 is an example of the second stride convolutional filter. In other cases, the second stride convolutional filter can generate the second weighted data set from the output of another convolutional filter. In an illustrative example, the first weighted data set that has the first resolution can be formed by the first stride convolutional filter, and the second stride convolutional filter can be concatenated after the first stride convolutional filter to form the second weighted data set that has the second resolution.
[0112] [0112] In some cases, the process 1300 includes upscaling the second weighted data set that has the second resolution to the first resolution, and generating combined weighted data representative of the raw image data patch by combining the upscaled second set of weighted data with the first weighted data set that has the first resolution. Using the example above, the output of the CNN1 with stride 623 (as the second stride convolutional filter) can be upsampled so that the values of the CNN1 with stride 623 can be combined with the output of the CNN2 with stride 624 (as the first stride convolutional filter). In some cases, a first convolutional filter with a stride equal to 1 can be placed in the network after the first stride convolutional filter, and a second convolutional filter with a stride equal to 1 can be placed in the network after the second stride convolutional filter. In such cases, the output matrix of the second convolutional filter with a stride of 1 can be upscaled, and the upscaled output matrix can be combined with the output matrix of the first convolutional filter with a stride of 1. An example of the first convolutional filter with a stride of 1 is CNN9 632 shown in Figure 6, and an example of the second convolutional filter with a stride of 1 is CNN8 631.
[0113] [0113] In some examples, the process 1300 can include applying one or more convolutional filters to the combined weighted data to generate feature data representative of the raw image data patch. Each convolutional filter of the one or more convolutional filters includes a matrix of weights. Each of the convolutional filters can also have a stride of 1, in which case the convolutional filters are not stride filters (they do not have a stride greater than 1).
[0114] [0114] In some cases, the process 1300 can include upscaling the feature data to a full resolution, and generating combined feature data representative of the raw image data patch by combining the upscaled feature data with full resolution feature data. The full resolution feature data is generated by applying a convolutional filter to a full resolution version of the raw image data patch.
[0115] [0115] In some examples, generating the output image data patch includes applying a final convolutional filter to the feature data or to the combined feature data to generate the output image data. In some cases, the at least one neural network does not include a fully connected layer. For example, a fully connected layer is not used before or after the final convolutional filter. In some cases, the at least one neural network does not include any pooling layers. For example, a pooling layer is not used before or after the final convolutional filter.
[0116] [0116] In some cases, the plurality of color components per pixel includes a red color component per pixel, a green color component per pixel and a blue color component per pixel. In some cases, the plurality of color components per pixel includes a luma color component per pixel, a first chroma color component per pixel and a second chroma color component per pixel.
[0117] [0117] In some cases, the at least one neural network jointly performs multiple image signal processor (ISP) functions. In some instances, the at least one neural network includes at least one convolutional neural network (CNN). In some cases, the at least one neural network includes a plurality of layers. In some cases, the plurality of layers is connected to a high dimensional representation of the raw image data patch.
[0118] [0118] In some examples, the process 1300 can be performed by a computing device or apparatus, such as the machine learning ISP 200 shown in Figure 2. In some cases, the computing device or apparatus can include a processor, microprocessor, microcomputer, or other component of a device that is configured to carry out the steps of the process 1300. In some examples, the computing device or apparatus can include a camera configured to capture video data (for example, a video stream) that includes video frames. In some cases, the computing device can include a camera device that can include a video codec. In some instances, a camera or other capture device that captures the video data is separate from the computing device, in which case the computing device receives the captured video data. The computing device can additionally include a network interface configured to communicate the video data. The network interface can be configured to communicate data based on Internet Protocol (IP) or any other suitable type of data.
[0119] [0119] The process 1300 is illustrated as a logical flow diagram, whose operations represent a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
[0120] [0120] Additionally, the process 1300 can be performed under the control of one or more computer systems configured with executable instructions and can be implemented as code (for example, executable instructions, one or more computer programs, or one or more applications) that collectively executes on one or more processors, by hardware, or combinations thereof. As noted above, the code can be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program that comprises a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium can be non-transitory.
[0121] [0121] In the foregoing description, aspects of the application are described with reference to specific embodiments thereof, but those skilled in the art will recognize that the invention is not limited thereto. Thus, although illustrative embodiments of the application have been described in detail in this document, it should be understood that the inventive concepts can be otherwise embodied and employed in a variety of ways, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art. Various features and aspects of the invention described above can be used individually or jointly. In addition, the embodiments can be used in any number of environments and applications beyond those described in this document without departing from the broader spirit and scope of the specification. The specification and drawings should therefore be regarded as illustrative rather than restrictive. For illustrative purposes, the methods have been described in a particular order. It should be noted that, in alternative embodiments, the methods can be carried out in an order different from that described.
[0122] [0122] When components are described as being "configured to" perform certain operations, such configuration can be achieved, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (for example, microprocessors or other suitable electronic circuits) to perform the operation, or any combination thereof. A person of ordinary skill in the art will note that the less-than ("<") and greater-than (">") symbols or terminology used in this document can be replaced by less-than-or-equal-to ("≤") and greater-than-or-equal-to ("≥") symbols, respectively, without departing from the scope of this description.
[0123] [0123] The various illustrative logic blocks, modules, circuits and algorithm steps described in connection with the embodiments disclosed in this document can be implemented as electronic hardware, computer software, firmware, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art can implement the described functionality in various ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
[0124] [0124] The techniques described in this document can also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques can be implemented in any of a variety of devices, such as general purpose computers, wireless communication device handsets, or integrated circuit devices that have multiple uses, including application in wireless communication device handsets and other devices. Any features described as modules or components can be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques can be performed, at least in part, by a computer-readable data storage medium that comprises program code including instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium can form part of a computer program product, which can include packaging materials. The computer-readable medium can comprise memory or data storage media, such as random access memory (RAM), such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media and the like. The techniques can, additionally or alternatively, be performed, at least in part, by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures that can be accessed, read and/or executed by a computer, such as propagated signals or waves.
[0125] [0125] The program code can be executed by a processor, which can include one or more processors, such as digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other equivalent integrated or discrete logic circuitry. Such a processor can be configured to perform any of the techniques described in this disclosure. A general purpose processor can be a microprocessor; in the alternative, however, the processor can be any conventional processor, controller, microcontroller or state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors together with a DSP core, or any other such configuration.
Consequently, the term "processor", as used in this document, can refer to any of the foregoing structures, any combination of the foregoing structures, or any other structure or apparatus suitable for implementing the techniques described in this document.
In addition, in some respects, the functionality described in this document may be provided within dedicated software modules or hardware modules configured for encoding or decoding, or incorporated into a combined video encoder-decoder (CODEC).
Claims (30)
[1]
1. Method of processing image data using one or more neural networks, the method comprising: obtaining a patch of raw image data, the patch of raw image data including a subset of pixels from a frame of raw image data captured using one or more image sensors, wherein the patch of raw image data includes a single color component for each pixel in the subset of pixels; applying at least one neural network to the patch of raw image data to determine a plurality of color component values for one or more pixels in the subset of pixels; and generating an output image data patch based on applying the at least one neural network to the patch of raw image data, wherein the output image data patch includes a subset of pixels from an output image data frame and includes the plurality of color component values for one or more pixels in the subset of pixels of the output image data frame, and wherein applying the at least one neural network causes the output image data patch to include fewer pixels than the patch of raw image data.
[2]
A method according to claim 1, wherein the raw image data frame includes image data from the one or more image sensors filtered by a color filter matrix.
[3]
A method according to claim 2, wherein the color filter matrix includes a Bayer color filter matrix.
[4]
4. Method according to claim 1, wherein applying the at least one neural network to the raw image data patch includes: applying one or more stride convolutional filters to the raw image data patch to generate reduced resolution data representative of the raw image data patch, with each stride convolutional filter of the one or more stride convolutional filters including a weight matrix.
[5]
A method according to claim 4, wherein each stride convolutional filter of the one or more stride convolutional filters includes a plurality of channels, wherein each channel of the plurality of channels includes a different weight matrix.
[6]
6. The method of claim 4, wherein the one or more stride convolutional filters include a plurality of stride convolutional filters, the plurality of stride convolutional filters including: a first stride convolutional filter that has a first weight matrix, wherein applying the first stride convolutional filter to the raw image data patch generates a first set of weighted data representative of the raw image data patch, the first set of weighted data having a first resolution; and a second stride convolutional filter that has a second weight matrix, wherein applying the second stride convolutional filter generates a second set of weighted data representative of the raw image data patch, the second set of weighted data having a second resolution that is less than the first resolution.
[7]
Method according to claim 6, which further comprises: scaling up the second weighted data set having the second resolution to the first resolution; and generating combined weighted data representative of the raw image data patch by combining the second upwardly scaled set of weighted data with the first weighted data set that has the first resolution.
[8]
8. Method according to claim 7, which further comprises: applying one or more convolutional filters to the combined weighted data to generate feature data representative of the raw image data patch, with each convolutional filter of the one or more convolutional filters including a weight matrix.
[9]
9. Method according to claim 8, which further comprises: upscaling the feature data to a full resolution; and generating combined feature data representative of the raw image data patch by combining the upscaled feature data with full resolution feature data, the full resolution feature data being generated by applying a convolutional filter to a full resolution version of the raw image data patch.
[10]
10. The method of claim 9, wherein generating the output image data patch includes: applying a final convolutional filter to the feature data or the combined feature data to generate the output image data.
[11]
11. Method according to claim 1, which further comprises: obtaining additional data to augment the obtained raw image data patch, the additional data including at least one or more of tone data, radial distance data, or automatic white balance (AWB) gain data.
[12]
12. The method of claim 1, wherein the at least one neural network includes a plurality of layers, and wherein the plurality of layers is connected to a high dimensional representation of the raw image data patch.
[13]
13. Apparatus for processing image data using one or more neural networks, comprising: a memory configured to store image data; and a processor configured to: obtain a patch of raw image data, the patch of raw image data including a subset of pixels from a frame of raw image data captured using one or more image sensors, wherein the patch of raw image data includes a single color component for each pixel in the subset of pixels; apply at least one neural network to the patch of raw image data to determine a plurality of color component values for one or more pixels in the subset of pixels; and generate an output image data patch based on applying the at least one neural network to the patch of raw image data, wherein the output image data patch includes a subset of pixels from an output image data frame and includes the plurality of color component values for one or more pixels in the subset of pixels of the output image data frame, and wherein applying the at least one neural network causes the output image data patch to include fewer pixels than the patch of raw image data.
[14]
Apparatus according to claim 13, wherein the raw image data frame includes image data from the one or more image sensors filtered by a color filter matrix.
[15]
Apparatus according to claim 14, wherein the color filter matrix includes a Bayer color filter matrix.
[16]
An apparatus according to claim 13, wherein applying the at least one neural network to the patch of raw image data includes: applying one or more stride convolutional filters to the patch of raw image data to generate reduced resolution data representative of the raw image data patch, with each stride convolutional filter of the one or more stride convolutional filters including a weight matrix.
[17]
Apparatus according to claim 16, wherein each stride convolutional filter of the one or more stride convolutional filters includes a plurality of channels, wherein each channel of the plurality of channels includes a different weight matrix.
[18]
Apparatus according to claim 16, wherein the one or more stride convolutional filters include a plurality of stride convolutional filters, the plurality of stride convolutional filters including: a first stride convolutional filter that has a first weight matrix, wherein applying the first stride convolutional filter to the raw image data patch generates a first set of weighted data representative of the raw image data patch, the first set of weighted data having a first resolution; and a second stride convolutional filter that has a second weight matrix, wherein applying the second stride convolutional filter generates a second set of weighted data representative of the raw image data patch, the second set of weighted data having a second resolution that is less than the first resolution.
[19]
An apparatus according to claim 18, wherein the processor is additionally configured to: upscale the second weighted data set having the second resolution to the first resolution; and generate combined weighted data representative of the raw image data patch by combining the upscaled second set of weighted data with the first weighted data set that has the first resolution.
[20]
20. Apparatus according to claim 19, wherein the processor is additionally configured to: apply one or more convolutional filters to the combined weighted data to generate feature data representative of the raw image data patch, with each convolutional filter of the one or more convolutional filters including a weight matrix.
[21]
21. Apparatus according to claim 20, wherein the processor is further configured to: upscale the feature data to a full resolution; and generate combined feature data representative of the raw image data patch by combining the upscaled feature data with full resolution feature data, the full resolution feature data being generated by applying a convolutional filter to a full resolution version of the raw image data patch.
[22]
22. Apparatus according to claim 21, wherein generating the output image data patch includes: applying a final convolutional filter to the feature data or the combined feature data to generate the output image data.
[23]
23. Apparatus according to claim 13, wherein the processor is additionally configured to:
obtain additional data to augment the obtained raw image data patch, the additional data including at least one or more of tone data, radial distance data, or automatic white balance (AWB) gain data.
[24]
Apparatus according to claim 13, wherein the at least one neural network includes a plurality of layers, and wherein the plurality of layers is connected to a high dimensional representation of the raw image data patch.
[25]
25. Apparatus according to claim 13, further comprising a camera for capturing images.
[26]
26. Non-transitory computer-readable medium having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to: obtain a patch of raw image data, the patch of raw image data including a subset of pixels from a frame of raw image data captured using one or more image sensors, wherein the patch of raw image data includes a single color component for each pixel in the subset of pixels; apply at least one neural network to the patch of raw image data to determine a plurality of color component values for one or more pixels in the subset of pixels; and generate an output image data patch based on applying the at least one neural network to the patch of raw image data, wherein the output image data patch includes a subset of pixels from an output image data frame and includes the plurality of color component values for one or more pixels in the subset of pixels of the output image data frame, and wherein applying the at least one neural network causes the output image data patch to include fewer pixels than the patch of raw image data.
[27]
27. Non-transitory computer-readable medium according to claim 26, wherein the raw image data frame includes image data from the one or more image sensors filtered by a color filter matrix.
[28]
28. Non-transitory computer-readable medium according to claim 26, wherein applying the at least one neural network to the raw image data patch includes: applying one or more stride convolutional filters to the raw image data patch to generate reduced resolution data representative of the raw image data patch, with each stride convolutional filter of the one or more stride convolutional filters including a weight matrix.
[29]
29. Non-transitory computer-readable medium according to claim 28, wherein each stride convolutional filter of the one or more stride convolutional filters includes a plurality of channels, wherein each channel of the plurality of channels includes a different weight matrix.
[30]
30. Non-transitory computer-readable medium according to claim 28, wherein the one or more stride convolutional filters include a plurality of stride convolutional filters, the plurality of stride convolutional filters including:
a first stride convolutional filter that has a first weight matrix, wherein applying the first stride convolutional filter to the patch of raw image data generates a first set of weighted data representative of the patch of raw image data, the first set of weighted data having a first resolution; and a second stride convolutional filter that has a second weight matrix, wherein applying the second stride convolutional filter generates a second set of weighted data representative of the raw image data patch, the second set of weighted data having a second resolution that is less than the first resolution.